# To conserve interactive plots when downloading the notebook as HTML
# %%HTML
# <script src="require.js"></script>
import sys, os
from importlib import reload
import typing, requests, json, logging
reload(logging)
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(name)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S',
)
from IPython.display import Markdown, display
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
from datetime import datetime
import numpy as np
import plotly.express as px
from plotly.missing_ipywidgets import FigureWidget
import seaborn as sns
def save_html_plot(
fig: FigureWidget, path: str
) -> None:
fig.write_html(
path,
config={
"displaylogo": False,
},
include_plotlyjs="cdn",
)
OUTPUTS_PATH = "outputs/"
api_keys_dict = json.load(open("api_keys.json", "r"))
# print("api_keys.json contents:", api_keys_dict)
ETHERSCAN_API_KEY = api_keys_dict["etherscan"]
BITQUERY_API_KEY = api_keys_dict["bitquery"]
if ETHERSCAN_API_KEY == "1234abcdf" or BITQUERY_API_KEY == "1234abcdf":
raise ValueError("Fill the api_keys.json file with your API keys before running this notebook!")
2022-11-02 13:34:52 [INFO] numexpr.utils - NumExpr defaulting to 8 threads.
Author: Dani Salgado Rojo
In this tutorial you will learn the basic notions and knowledge about the Blockchain technology as well as some practical examples about retrieving and visualizing data from a blockchain via public APIs with Python.
We will focus mostly on the Ethereum blockchain.
For any comments about the tutorial (suggestions for improvement, typos, error reports, etc), please contact me at daniel.salgado.rojo@gmail.com.
Some sections include at the end some additional resources. If marked as (Optional), you may skip them and continue reading unless you need to clarify more the concepts; If marked as (Recommended), I strongly encourage you reading the resource.
I also recommend you doing the Exercises proposed at the end of Section 3.
Before you continue reading this tutorial, please, start watching two particular videos from YouTube.
Blockchain (or chain of blocks) is a technology to record changes (transactions) over time in a non-destructive way that became known for its use around cryptocurrencies, but its scope goes beyond them. This technology is used like a scribe's book, where each event or modification of the data is written as a new block of a chain and in this way, there is a settled, certified record and its integrity and availability are guaranteed. If this content is also encrypted, it guarantees confidentiality. This unique and unalterable registry of data is distributed in several nodes of a decentralized network where each block of the chain stores information about itself, about valid transactions of some kind, and about its relationship with the previous and next block of the chain. The figure below shows a simplified visual representation of the blockchain transactions process. See the figure in section 1.2 to see an abstract representation of a blockchain.

Although, as we said, the use of blockchain is strongly associated with cryptocurrencies, this technology can be used for other types of digital assets, such as NFTs, smart contracts, tokens, etc. In addition, in recent years, blockchain-based solutions have had a very important growth, being adopted by industries such as finance to improve the security and efficiency of existing products and services, as well as by the food industry and the supply chain for the product tracking or in the scientific community for documentation management.
Going back, blockchain technology allows transactions to be recorded and an asset to be tracked within the network without the need for intermediaries (decentralization), since the thousands or millions of nodes (e.g., computer machines from different authorized persons and entities) that make up the network are the ones in charge of verifying and validating those transactions that will be recorded in a new block that is added to the chain. To accomplish this distributed agreement (validation and verification), blockchains need a consensus mechanism.

The (regular) users that want to operate on a blockchain (via transactions, for instance to trade cryptocurrencies with other users) need to have an "account" associated to that blockchain in particular. In a few words, a user is uniquely identified in the blockchain via a public key (which to be simple, would be like a "bank account address" or "your email" that anyone can see) which can be used by other users, for instance, to send some cryptocurrencies to the account (address) associated with that public key. Also, each user must have a secret key (a code like a "PIN" or "password") paired with the public key, which is what uniquely proves that you are the owner of the cryptocurrency received in the transaction. In contrast, "miners" are authorized users that provide their computational power (computers) to conform a particular type of nodes that participate in the validation of new Blocks or transactions.
There are different types of blockchain networks: there are private, public, federated and hybrid networks. Depending on the characteristics of each project or sector, each organization will choose the one that best suits its needs among the different types of blockchain. Although public blockchains, such as Bitcoin or Ethereum, are the best known, many companies have become interested in the use of private blockchains to protect sensitive information. In this sense, there are companies offering blockchain as a service (BaaS), which allows other organizations to create and use applications based on blockchain through a cloud infrastructure.

There are different types of cryptocurrencies. Each of them is used in a similar way to traditional money, since they serve to transfer value or as a means of exchange. While some people often refer to cryptocurrency (or "cryptocoin") as a synonym for token, there are some differences. On the one hand, a cryptocurrency is created within its own native blockchain network (e.g., the native currency of the Ethereum blockchain is the Ether (ETH)) while tokens are created using existing blockchain protocols (e.g., CryptoKitties is an example of a token-based (NFTs) game which is built on top of the Ethereum blockchain).
In the case of cryptocurrencies, we can think of blockchain as the ledger where each transaction is recorded. Its operation can be complex to understand if we delve into the internal details of its implementation, but the basic idea is easy to follow.
In each block it is stored:
Therefore, each block has a specific and immovable place within the chain, since each block contains information (hash) of the previous block. The complete chain is stored in each node of the network that makes up the blockchain, so an exact copy of the chain is stored in all the nodes that participate in the network.
As new records are created, they are first verified and validated by the nodes in the network and then added to a new block that is linked to the chain.
Illustration of a blockchain of Three blocks surrounded by what would represent the network of nodes that save a copy of the information stored in the blockchain and work in validating transactions and operations within the blockchain
To recap:
We can think of a blockchain as database that is updated and shared across many computers in a network.
"Block" refers to data and state being stored in consecutive groups known as "blocks". (e.g., in the Ethereum blockchain, if you send ETH to someone else, the transaction data needs to be added to a block to be successful).
"Chain" refers to the fact that each block cryptographically references its parent. In other words, blocks get chained together. The data in a block cannot change without changing all subsequent blocks, which would require the consensus of the entire network.
Every computer in the network must agree upon each new block and the chain as a whole. These computers are known as "nodes". Nodes ensure everyone interacting with the blockchain has the same data. To accomplish this distributed agreement, blockchains need a consensus mechanism.
From now on in this tutorial, we will focus on public blockchains which in most of the cases have a native cryptocurrency ("coin") associated with it, among many other tokens (projects build on top of the blockchain that provide new 'digital assets' which are completely different from what the "coin" or 'native asset' of the blockchain is).
The elements that make up and contribute to the blockchain are its ecosystem. Elements that you will find in any blockchain ecosystem are:

Although those are the core elements of a blockchain ecosystem, they are often only the beginning. Starting with Ethereum, many modern blockchains have launched with the capability of running smart contracts. A smart contract is a self-executing blockchain contract (code), like a program that runs on a blockchain.
Smart contracts dramatically expand blockchain technology capabilities, and they introduce a new group in a blockchain ecosystem: the projects built on that blockchain. Here are just a few of the types of projects that can be created on a smart contract blockchain:
The internet has completely changed the game for the modern person, it changed the way we live, work and socialize and it is still shifting gears for us; it brought about a revolution that turned the world into a global village. The internet has previously experienced 2 shifts, with a 3rd on its way. Web 1.0 (Web1), Web 2.0 (Web2), and Web 3.0 (Web3) came to limelight in 1996, 2006, and 2016 respectively; each intended to be an upgrade from the previous and an upgrade it sure has been.

The first shift was Web1 otherwise known as the "Wold Wide Web", the first-generation internet which gave us a read-only access to platforms, a marvel of engineering at the time. Built on HTML technology, it provided web pages for companies to put out information on websites to an audience reading up to date news any day, anytime.
In Web1, Internet users had a fairly limited role: the consumption of information that was hosted on computer servers.
Web2 came to prominence with the so-called "dotcom bubble era". Web2 allowed users to modify designs and code in more detail, but it also meant the corporatization of the Internet.
In this way, Web2 encourages interaction between the Web and users and the Internet becomes a collaborative platform in which all users can participate (blogs, social networks,...), giving them a greater voice than with Web1, but with the disadvantage that this Web is actually dominated by companies offering services in exchange for personal data. By registering to a web page or accepting cookies, we are giving personal information to the owners of those pages for free, and they sell it to other corporations which then use it, for instance, to create customized ads and marketing.
Thus follows Web 3, the overarching goal of this iteration of the Web is to make the internet a lot more intelligent, autonomous, and open. To make the internet akin to real-life, utilizing artificial-intelligence, secure transfer of value using blockchains, 3D graphics, virtual reality, and augmented reality.
Web3 is intended to be a pro-privacy and anti-monopoly Web; this involves eliminating middlemen like the government or entities who have been in control of user data in Web2. Decentralizing the web will give users complete control of data and security encryption. Blockchains will be accessible to everyone, eliminating gender, geographical, sexual orientation, or socio-economic bias.
In this section we will deep into learning how blockchain data looks like in the Ethereum blockchain using the Etherscan explorer. The majority of features (or at least, similar ones) in the blockchain data from Ethereum are also present in other blockchains, but we focus on the Ethereum one since it is, together with Bitcoin, one of the two most popular and currently used blockchains for real world applications.
This section will be useful for us for when we retrieve blockchain data from APIs in Section 3; to understand and easily guess, for each of the features/variables retrieved, to what they are referring to.
Ethereum is a public blockchain with a "virtual" computer embedded in it. It is the foundation for building apps and organizations in a decentralized, permissionless, censorship-resistant way.
In the Ethereum universe, there is a single, canonical computer (called the Ethereum Virtual Machine, or EVM) whose state everyone on the Ethereum network agrees on. Everyone who participates in the Ethereum network (every Ethereum node) keeps a copy of the state of this computer. Additionally, any participant can broadcast a request for this computer to perform arbitrary computation. Whenever such a request is broadcast, other participants on the network verify, validate, and carry out ("execute") the computation. This execution causes a state change in the EVM, which is committed and propagated throughout the entire network.
Requests for computation are called transaction requests; the record of all transactions and the EVM's present state gets stored on the blockchain, which in turn is stored and agreed upon by all nodes.
Cryptographic mechanisms ensure that once transactions are verified as valid and added to the blockchain, they can't be tampered with later. The same mechanisms also ensure that all transactions are signed and executed with appropriate "permissions" (no one should be able to send digital assets from Alice's account, except for Alice herself, given that Alice is the only person in the world that knows his private key).
Ether (ETH) is the native cryptocurrency of Ethereum. The purpose of ETH is to allow for a market for computation. Such a market provides an economic incentive for participants to verify and execute transaction requests and provide computational resources to the network.
Any participant who broadcasts a transaction request must also offer some amount of ETH to the network as a bounty called Gas Fee. The network will award this bounty to whoever eventually does the work of verifying the transaction, executing it, committing it to the blockchain, and broadcasting it to the network.
Note: When we talk about "Ethereum" we are technically referring to a blockchain, the Ethereum blockchain. Usually people say "I have bought 'X' Ethereum" or "I have 'X' Ethereum", but they are referring to the fact that they "have bought" or "have in possession" 'X' units of ETH or Ether" in the Ethereum blockchain.
In practice, participants don't write new code every time they want to request a computation on the EVM. Rather, application developers upload programs (reusable snippets of code) into EVM state, and users make requests to execute these code snippets with varying parameters. We call the programs uploaded to and executed by the network smart contracts.
At a very basic level, you can think of a smart contract like a sort of vending machine: a script that, when called with certain parameters, performs some actions or computation if certain conditions are satisfied. For example, a simple vendor smart contract could create and assign ownership of a digital asset if the caller sends ETH to a specific recipient. In other words, this contract would allow users to buy tokens, NFTs, and the likes build on top of Ethereum paying with ETH.
Any developer can create a smart contract and make it public to the network, using the blockchain as its data layer, for a fee paid to the network. Any user can then call the smart contract to execute its code, again for a (Gas) fee paid to the network.
Thus, with smart contracts, developers can build and deploy arbitrarily complex user-facing apps and services such as: marketplaces, financial instruments, games, etc.
An Ethereum account is an entity with an ether (ETH) balance that can send transactions on Ethereum..
Accounts can be user-controlled or deployed as smart contracts:
Both account types have the ability to:
[Important] A wallet is an interface for managing an (EOA) Ethereum account as well as its public and private key. Given an Ethereum account, the public key is uniquely associated with an Ethereum (account) address: if the account is an EOA this address is referred to as wallet address (or EOA address), if the account is a contract account, we refer to it as contract address or smart contract address.
Additional resources:
An Ethereum transaction refers to an action initiated by an externally-owned account, as we already know, an account managed by a human, not a contract. For example, if Alice sends Bob 1 ETH, Alice's account must be debited and Bob's must be credited (this is an example of regular transaction). This state-changing action takes place within a transaction.
Transactions, which change the state of the EVM, need to be broadcast to the whole network. Any node can broadcast a request for a transaction to be executed on the EVM; after this happens, a validator will execute the transaction and propagate the resulting state change to the rest of the network. Transactions require a (Gas) fee and must be included in a validated block.
A submitted transaction includes the following information:
Gas is a reference to the computation required to process the transaction by a validator. It is paid in ETH but denominated in other units such as Wei and Gwei to be more measurable. Wei is used when describing smaller value transactions, and Gwei is commonly used when dealing with gas fees.
The transaction object will look a little like this:
{
from: "0xEA674fdDe714fd979de3EdF0F56AA9716B898ec8",
to: "0xac03bb73b6a9e108530aff4df5077c2b3d481e5a",
gasLimit: "21000",
maxFeePerGas: "300",
maxPriorityFeePerGas: "10",
nonce: "0",
value: "10000000000"
}
[Important] On Ethereum there are a few different types of transactions:
Additional resources:
Etherscan is a block explorer and analytics platform which lets you view and analyze assets, balances and transactions on the Ethereum network. You can also easily interact with smart contracts and check gas prices on Etherscan.
While it seems to share many functions of a crypto wallet, Etherscan is not a wallet service provider. It cannot store private keys, nor can it control outgoing or incoming transactions on the Ethereum blockchain. It only shows public data or on-chain data.
Additional resources:
Etherscan provides with all metadata an information regarding to the blocks that are being created in the Ethereum network.
There is a page to see a table with all the blocks that are being added to the blockchain. Please, read the etherscan's documentation about the blocks page to learn about which fields are shown about a block on that page (Block#, Age, Txn, Fee Recipient, Gas Used, Gas Limit, Base Fee, Reward, Burnt Fees (ETH)).

Then, one can see the metadata and details about a concrete block (lets say, Block #15818611 ]) in the blockchain visiting a "block details" page. Please, read the etherscan's documentation about the block details page to learn about each of the fields that define a particular block. A concrete example is shown below.

As we mentioned earlier in section 1, a type of digital assets that are built within a blockchain are the (crypto) tokens.
These tokens have a smart contract associated with them. In the Ethereum blockchain, a Token Smart Contract is the (Ethereum) smart contract representing the mechanism of creation and distribution of the Token.
Please, read the etherscan's documentation about the token pages to learn about each of the feature shown in a Token page.
Popular Token Example
Although it is what is known as a "shit" or "meme" coin (at least when the Token project started) due to its lack of value or utility, one of the most "hyped" tokens is SHIBA INU. Below we can see how its Token page looks like:

Additional resources:
Another (general) type of digital asset build within the blockchain are smart contracts. We have just seen a particular example of smart contract, the token smart contracts that define a token.
For instance, the smart contract address page for the SHIBA INU token is the following:

Smart contracts are developed in a programming language similar to JavaScript or C which is called Solidity. In the "Contract" tab, we can see the details and code of an smart contract.
Please, read the etherscan's documentation about the contract address page to learn about more details.
Overview

[Important] After this few introduction, please read the Etherscan's documentation on EOA address pages to learn about all the data that can be accessed/visualized for a particular wallet address.
Additional resources:
As we already know, transactions are the basic compounds that constitute a block in the blockchain. But how are transactions defined in terms of data (features)?
To answer this question, we will be exploring three basic examples of transactions that belong to one of the transaction types we saw in Section 2.1.4.
When viewing an address in Etherscan, there will be multiple tabs for the different type of transactions performed by that address:

A. Normal transactions
Transactions where some ETH is sent "From" a wallet address (EOA) directly "To" another EOA. When viewing an address on Etherscan, this type of transaction will be shown under the Transactions tab. Click here for a sample transaction:

In this transaction, an anonymous wallet address sends some ETH to a wallet address owned by Etherscan to receive donations (See the "To" and "From" fields). With the terminology of Section 2.1.4, this is an example of a Regular transaction.
Another example for these type of transactions would be when you buy something to someone else and you want to pay him/her with ETH.

B. Internal transactions
Transactions that involve a transfer of ETH that is carried out through a smart contract as an intermediary. When viewing an address on Etherscan, this type of transaction will be shown under the Internal Txns tab (in the Etherscan Address page). Click here for a sample transaction.

We see that the "To" feature is a Contract address, not a Wallet address as it was for a Normal transaction. Therefore, with the terminology of Section 2.1.4, this is an example of a Execution of a contract transaction.
From the Internal Txns tab (now in the Transaction details page) we can see that "From" a contract which is executed (called), 0.02415993 Ether are transferred "To" the Etherscan donations wallet address:

C. Token transfers
Transactions of ERC-20 or ERC-721 tokens are labelled as Token Transfer transactions. When viewing an address on Etherscan, this type of transaction will be shown under either the Erc20 Token Txns, Erc721 Token Txns, Erc1155 Token Txns (and so on) tabs, depending on the respective token type. Click here for a sample transaction.

We see that the "To" feature is a Contract address. Therefore, with the terminology of Section 2.1.4, this is another example of a Execution of a contract transaction, where the contract that is being executed is the contract of a Token (as we saw, the token contract is the provider of the mechanism of creation and distribution of the Token).
To get finished with transactions, please read the "What are the key areas of information covered?" section of that Etherscan's article where each of the fields that appear in the 'transaction details' page are explained. If you have time, I recommend you read the whole article.
In this section we will see how to get data from an external API that allows to retrieve data from the blockchain using Python.
The APIs that we will use are Etherscan's API, Bitquery (and its GraphQL IDE) and Coingecko.
Prerequisites
Keep in mind that API Keys are private, so you should not share it with anyone.
blockchain-tutorial-python.

In order to make an API request to some API endpoint (an specific url given by the API provider) with Python, we can use the requests module.
At the same file level as this notebook, there should be located a api_keys.json file where you should replace the "1234abcdf" value by your own API keys. At the time of writting this tutorial, you don't have to fill the "coingecko" api field in that JSON since you can make API requests to Coingecko without API key.
api_keys.json
{
"etherscan": "1234abcdf",
"bitquery": "1234abcdf",
"coingecko": "1234abcdf"
}
The first example we will see consists on retrieving the last "N" Normal transactions from a wallet or contract address
The goal is to retrieve all transactions from the Vitalik Buterin wallet only during the date 2022-10-26.

We see that there are 2 transactions where the VB is the "to" address, and 1 transaction where it is the "from" address of the transaction.
The transactions are ordered by date descendingly so that newest transactions are first. The transactions we want to retrieve are located between block #15828766 and block #15834718.
I have created the get_etherscan_post_request_json function to facilitate making POST requests to endpoints of the Etherscan API.
def get_etherscan_post_request_json(url_params_dict: typing.Dict[str, typing.Any],
api_key: typing.Optional[str],
verbose: bool = False) -> dict:
"""
Function to build Etherscan API URLs and performing a POST request given a set of parameters and a valid API key.
Example of URL:
url_params_dict = {
"module": "account",
"action": "txlist",
"address": "0xc5102fE9359FD9a28f877a67E36B0F050d81a3CC",
"startblock": 0,
"endblock": 99999999,
"page": 1,
"offset": 10,
"sort": "asc"
}
https://api.etherscan.io/api
?module=account
&action=txlist
&address=0xc5102fE9359FD9a28f877a67E36B0F050d81a3CC
&startblock=0
&endblock=99999999
&page=1
&offset=10
&sort=asc
&apikey={api_key}
"""
url = ""
base_url = "https://api.etherscan.io/api"
url += base_url
for i, item in enumerate(url_params_dict.items()):
key = item[0]
value = item[1]
if i == 0:
url += f"?{key}={value}"
else:
url += f"&{key}={value}"
url += f"&apikey={api_key}"
if verbose:
logging.info(f"Requested url: {url}")
request = requests.post(url)
if request.status_code == 200:
return request.json()
else:
raise Exception(
f"Post request failed for url '{url}'. Request status code: {request.status_code}"
)
The dictionary of parameters to use within a call to the get_etherscan_post_request_json function to obtain the transactions at 2022-10-27 are the following:
url_params_dict = {
"module": "account",
"action": "txlist",
"address": "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B", # Vitalik Buterin wallet address
"startblock": 15828766, # Block to start searching transactions from
"endblock": 15834718, # Block to stop searching transactions at
"page": 1,
"offset": 10, # get at most 10 transactions (we expect to be just 3)
"sort": "desc", # or asc
}
response_json = get_etherscan_post_request_json(url_params_dict, ETHERSCAN_API_KEY,True)
logging.info(f"Response JSON keys: {response_json.keys()}")
logging.info(f"status: {response_json['status']}")
logging.info(f"message: {response_json['message']}")
2022-11-02 13:34:55 [INFO] root - Requested url: https://api.etherscan.io/api?module=account&action=txlist&address=0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B&startblock=15828766&endblock=15834718&page=1&offset=10&sort=desc&apikey=MCXN2GXN8WV24D8PIN8A1264VAI2AN37J7 2022-11-02 13:34:56 [INFO] root - Response JSON keys: dict_keys(['status', 'message', 'result']) 2022-11-02 13:34:56 [INFO] root - status: 1 2022-11-02 13:34:56 [INFO] root - message: OK
Transform the list of transactions located in the "result" field of the API response JSON to a Pandas DataFrame using the json_normalize method:
results = response_json["result"]
results_df = pd.json_normalize(results)
logging.info(f"Dataset Features: {list(results_df.columns)}")
results_df.head()
2022-11-02 13:34:56 [INFO] root - Dataset Features: ['blockNumber', 'timeStamp', 'hash', 'nonce', 'blockHash', 'transactionIndex', 'from', 'to', 'value', 'gas', 'gasPrice', 'isError', 'txreceipt_status', 'input', 'contractAddress', 'cumulativeGasUsed', 'gasUsed', 'confirmations', 'methodId', 'functionName']
| blockNumber | timeStamp | hash | nonce | blockHash | transactionIndex | from | to | value | gas | gasPrice | isError | txreceipt_status | input | contractAddress | cumulativeGasUsed | gasUsed | confirmations | methodId | functionName | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15834718 | 1666817063 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 1317 | 0x79ea7326cd8d180d925e398756880445f060a4eefc9fb972cfda0304c212b492 | 113 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0 | 32868 | 17364769372 | 0 | 1 | 0x68747470733a2f2f747769747465722e636f6d2f56425f546f6b656e2f7374617475732f313538353337303332323436363835323836350a0a | 15593267 | 21912 | 47658 | 0x68747470 | ||
| 1 | 15833688 | 1666804655 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 1315 | 0xcd4c6b8f9726649a85dd5bdaff6f2fc6c3774dd01e2bde3a481c5254b4e2e6bb | 108 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0 | 40572 | 31584089397 | 0 | 1 | 0x57652077696c6c2072697365207468652024564220746f6b656e20696e20796f757220686f6e6f75722c2042657374206e61727261746976652073696e636520534849420a0a4f776e65722064756d706564206f6e2074686520636f6d6d756e69747920616e6420636c6f736573207468652074656c656772616d2c62656361757365206f6620746865204c50206265656e2031303025206275726e656420616e642074686520636f6e74726163742072656e6f756e6365642c2074686520636f6d6d756e69747920726f7365206974206261636b2066726f6d2064656174682062656361757365206f66207768617420796f7520676976652075732064617920696e20616e6420646179206f75742077697468207468652045544820626c6f636b636861696e200a0a7468616e6b796f752c20796f75206861766520737570706c7920746f20646f207768617420796f75207769736820776974682e0a0a68747470733a2f2f742e6d652f5642636f6d6d756e697479455243 | 8676810 | 27048 | 48688 | 0x57652077 | ||
| 2 | 15828766 | 1666745363 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 209 | 0xaa5a81e87fa41e48fbede4ad5c8b6902c26ca720e2d4ee07107fb0e4b9918b2e | 106 | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 3923686000000000000 | 21000 | 12383671336 | 0 | 1 | 0x | 14001536 | 21000 | 53610 | 0x |
Lets see the types of the columns in the dataframe and convert to numeric variables if necessary
results_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 blockNumber 3 non-null object 1 timeStamp 3 non-null object 2 hash 3 non-null object 3 nonce 3 non-null object 4 blockHash 3 non-null object 5 transactionIndex 3 non-null object 6 from 3 non-null object 7 to 3 non-null object 8 value 3 non-null object 9 gas 3 non-null object 10 gasPrice 3 non-null object 11 isError 3 non-null object 12 txreceipt_status 3 non-null object 13 input 3 non-null object 14 contractAddress 3 non-null object 15 cumulativeGasUsed 3 non-null object 16 gasUsed 3 non-null object 17 confirmations 3 non-null object 18 methodId 3 non-null object 19 functionName 3 non-null object dtypes: object(20) memory usage: 608.0+ bytes
relevant_integer_columns = ["nonce" ,"transactionIndex", "value", "gas", "gasPrice", "gasUsed", "cumulativeGasUsed", "confirmations"]
results_df.loc[:,relevant_integer_columns] = results_df.loc[:,relevant_integer_columns].astype(float)
# we convert to float since "value" is too large for int type
Create a date (In UTC, as displayed in the Etherscan tables) column from the Timestamp column
results_df.loc[:, "date"] = results_df["timeStamp"].astype(int).apply(datetime.utcfromtimestamp)
main_relevant_columns = ["date", "blockNumber", "hash", "from", "to", "value", "gas", "gasUsed", "gasPrice" ]
results_df_relevant = results_df[main_relevant_columns]
results_df_relevant.head()
| date | blockNumber | hash | from | to | value | gas | gasUsed | gasPrice | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-26 20:44:23 | 15834718 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0.0 | 32868.0 | 21912.0 | 17364769372.0 |
| 1 | 2022-10-26 17:17:35 | 15833688 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0.0 | 40572.0 | 27048.0 | 31584089397.0 |
| 2 | 2022-10-26 00:49:23 | 15828766 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 3923686000000000000.0 | 21000.0 | 21000.0 | 12383671336.0 |
To finish with this example, lets try to obtain the most similar table to the one shown in Etherscan
df_etherscan = results_df_relevant.replace("0xab5801a7d398351b8be11c439e05c5b3259aec9b", "Vitalik Buterin")
# Convert "value" from WEI to ETH and gasPrice from WEI to GWEI
df_etherscan["value (ETH)"] = df_etherscan["value"] * 10**(-18)
df_etherscan["gasPrice (GWEI)"] = df_etherscan["gasPrice"] * 10**(-9)
# Txn Fee = Gas Used * Gas Price (and multiply by 10**(-9) since gasPrice is now in GWEI and we want Txn Fee in ETH)
df_etherscan["Txn Fee (ETH)"] = df_etherscan["gasUsed"] * df_etherscan["gasPrice (GWEI)"] * 10**(-9)
ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS = ["hash", "date", "from", "to", "value (ETH)", "Txn Fee (ETH)", "gasPrice (GWEI)"]
df_etherscan[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS]
| hash | date | from | to | value (ETH) | Txn Fee (ETH) | gasPrice (GWEI) | |
|---|---|---|---|---|---|---|---|
| 0 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 2022-10-26 20:44:23 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000380 | 17.364769 |
| 1 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 2022-10-26 17:17:35 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000854 | 31.584089 |
| 2 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 2022-10-26 00:49:23 | Vitalik Buterin | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 3.923686 | 0.000260 | 12.383671 |
You can check the results by comparing it with the Etherscan table.
Now we will try to extract the same transactions data we just obtained using the Etherscan's API but now using another API, the Bitquery API.
Bitquery has an interactive IDE to build GraphQL-like queries that allow to retreive blockhain data.
Exercise: Go to the Bitquery IDE and try to build a query to retrieve transactions where the "To" field is the Vitalik Buterin (0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B) address and only for 2022-10-26. Similarly changing just one line of the query, you should be able to retrieve the transactions where the "From" field is the Vitalik Buterin address. At the time of writting this tutorial, I have not found a way to retreive transactions that have either the "To" field or the "From" field with a particular address as value using a SINGLE query (If you find out, please contact me ;). There is a question in the Bitquery forum asking for the same). Try to add to the query the Bitquery fields that you think will return the same values as the variables ["hash", "date", "from", "to", "value (ETH)", "Txn Fee (ETH)", "gasPrice (GWEI)"] we just saw in the last section using Etherscan.

The solution of the first query can be seen in the figure below. (Note that in the final query we are using we have included the block timestamp not the date as was shown in the figure above, since block timestamp variable provides us not only with the date but also with the time. date variable misses the time - hour, minut, seconds - info). (Note: the currency field was included just to check that what is transferred is ETH).

To sum up, the query to retrieve transactions where "To" is the Vitalik Buterin address is :
gql
query TransactionsToVB26October2022 {
ethereum {
transactions(
txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
options: {desc: "block.timestamp.time"}
) {
gasPrice
gasValue
hash
to {
address
}
sender {
address
}
gas
currency {
address
name
symbol
}
amount
block(height: {gteq: 15828766, lteq: 15834718}) {
height
timestamp{
time(format: "%Y-%m-%d %H:%M:%S")
}
}
}
}
}
and the query for transactions where "From" is the Vitalik Buterin address would be the same but with the following change:
txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"} -> txSender: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
Lets create a function that generalizes the query above with some parameters:
def get_transactions_query(address: str = "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B",
start_block: int = 15828766,
end_block: int = 15834718,
transactions_to_address: bool = True) -> str:
"""
Given an <address> (contract or wallet address), a <start_block> block number and a <end_block> block
number.
Return:
- If <transactions_to_address> is True, a query to retrieve transactions that belong to blocks with
block number between <start_block> and <end_block>, and so that the "To" address is <address>.
- If <transactions_to_address> is False, the same query as when True but so that the "From" address
(the sender) is <address>.
"""
address_direction_field = "txTo" if transactions_to_address else "txSender"
date_format = "%Y-%m-%d %H:%M:%S"
query_str = """
query Transactions {
ethereum {
transactions(
%s: {is: "%s"}
options: {desc: "block.timestamp.time"}
) {
gasPrice
gasValue
hash
to {
address
}
sender {
address
}
gas
currency {
address
name
symbol
}
amount
block(height: {gteq: %d, lteq: %d}) {
height
timestamp{
time(format: "%s")
}
}
}
}
}
""" % (address_direction_field, address, start_block, end_block, date_format)
return query_str
Check for the first query:
transactions_to_vb_query = get_transactions_query(transactions_to_address = True)
print(transactions_to_vb_query)
query Transactions {
ethereum {
transactions(
txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
options: {desc: "block.timestamp.time"}
) {
gasPrice
gasValue
hash
to {
address
}
sender {
address
}
gas
currency {
address
name
symbol
}
amount
block(height: {gteq: 15828766, lteq: 15834718}) {
height
timestamp{
time(format: "%Y-%m-%d %H:%M:%S")
}
}
}
}
}
Check for the second query:
transactions_from_vb_query = get_transactions_query(transactions_to_address = False)
print(transactions_from_vb_query)
query Transactions {
ethereum {
transactions(
txSender: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
options: {desc: "block.timestamp.time"}
) {
gasPrice
gasValue
hash
to {
address
}
sender {
address
}
gas
currency {
address
name
symbol
}
amount
block(height: {gteq: 15828766, lteq: 15834718}) {
height
timestamp{
time(format: "%Y-%m-%d %H:%M:%S")
}
}
}
}
}
Once we have the queries we want to use to retrieve data, we need a function to do a POST request to the Bitquery API.
For Etherscan, we put the apikey as a parameter of the URL we were posting to, whereas now for bitquery we need to include the API key as a parameter in the Request header as you can see in the run_query function below. Also, the query string is sent as a JSON in the json parameter of the requests.post method.
def run_query(
query: str, api_key: str
) -> typing.Optional[
typing.Dict[str, typing.Any]
]:
"""
A simple function to use requests.post to make the API call to Bitquery API.
"""
headers = {"X-API-KEY": api_key}
request = requests.post(
"https://graphql.bitquery.io/",
json={"query": query},
headers=headers,
)
if request.status_code == 200:
return request.json() # could be None
else:
return {"error": {"status_code": request.status_code}}
Call the run_query function to the two queries that retrieve the transactions we want:
transactions_to_vb_json = run_query(transactions_to_vb_query, BITQUERY_API_KEY)
transactions_from_vb_json = run_query(transactions_from_vb_query, BITQUERY_API_KEY)
Explore how the json responses look like:
transactions_to_vb_json
{'data': {'ethereum': {'transactions': [{'gasPrice': 17.364768768,
'gasValue': 0.000380496823451648,
'hash': '0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e',
'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
'gas': 21912.0,
'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
'amount': 0.0,
'block': {'height': 15834718,
'timestamp': {'time': '2022-10-26 20:44:23'}}},
{'gasPrice': 31.584090112,
'gasValue': 0.00085428644347904,
'hash': '0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245',
'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
'gas': 27048.0,
'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
'amount': 0.0,
'block': {'height': 15833688,
'timestamp': {'time': '2022-10-26 17:17:35'}}}]}}}
We see that the list of transactions are stored in a "third key level": "data" -> "ethereum" -> "transactions".
Retrieve the list of transactions data from the json (python dict) and join them to have the list of all transactions:
transactions_to_list = transactions_to_vb_json["data"]["ethereum"]["transactions"]
transactions_from_list = transactions_from_vb_json["data"]["ethereum"]["transactions"]
transactions_to_and_from = transactions_to_list + transactions_from_list
print("List of all transactions:\n")
transactions_to_and_from
List of all transactions:
[{'gasPrice': 17.364768768,
'gasValue': 0.000380496823451648,
'hash': '0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e',
'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
'gas': 21912.0,
'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
'amount': 0.0,
'block': {'height': 15834718, 'timestamp': {'time': '2022-10-26 20:44:23'}}},
{'gasPrice': 31.584090112,
'gasValue': 0.00085428644347904,
'hash': '0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245',
'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
'gas': 27048.0,
'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
'amount': 0.0,
'block': {'height': 15833688, 'timestamp': {'time': '2022-10-26 17:17:35'}}},
{'gasPrice': 12.383671296,
'gasValue': 0.000260057098878976,
'hash': '0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86',
'to': {'address': '0xc79b96044906550a5652bcf20a6ea02f139b9ae5'},
'sender': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
'gas': 21000.0,
'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
'amount': 3.923686,
'block': {'height': 15828766, 'timestamp': {'time': '2022-10-26 00:49:23'}}}]
Convert the list of transactions to a pandas DataFrame using the json_normalize method:
raw_bitquery_df = pd.json_normalize(transactions_to_and_from)
raw_bitquery_df
| gasPrice | gasValue | hash | gas | amount | to.address | sender.address | currency.address | currency.name | currency.symbol | block.height | block.timestamp.time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.364769 | 0.000380 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 21912.0 | 0.000000 | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | - | Ether | ETH | 15834718 | 2022-10-26 20:44:23 |
| 1 | 31.584090 | 0.000854 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 27048.0 | 0.000000 | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | - | Ether | ETH | 15833688 | 2022-10-26 17:17:35 |
| 2 | 12.383671 | 0.000260 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 21000.0 | 3.923686 | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 0xab5801a7d398351b8be11c439e05c5b3259aec9b | - | Ether | ETH | 15828766 | 2022-10-26 00:49:23 |
Lets now rename the columns so that we can compare with the results we obtained using the Etherscan API.
Note that:
gasPrice is yet in GWEIgasValue is in ETH and is the feature that we called Txn Fee (ETH) amount is yet in ETH toobitquery_table_for_etherscan_comparison = raw_bitquery_df.rename(columns = {
"gasPrice": "gasPrice (GWEI)",
"gasValue": "Txn Fee (ETH)",
"amount": "value (ETH)",
"to.address": "to",
"sender.address": "from",
"block.height": "blockNumber",
"block.timestamp.time": "date"
})[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS].replace("0xab5801a7d398351b8be11c439e05c5b3259aec9b", "Vitalik Buterin")
# Table obtained using the Bitquery API to compare with the results obtained with Etherescan
bitquery_table_for_etherscan_comparison
| hash | date | from | to | value (ETH) | Txn Fee (ETH) | gasPrice (GWEI) | |
|---|---|---|---|---|---|---|---|
| 0 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 2022-10-26 20:44:23 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000380 | 17.364769 |
| 1 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 2022-10-26 17:17:35 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000854 | 31.584090 |
| 2 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 2022-10-26 00:49:23 | Vitalik Buterin | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 3.923686 | 0.000260 | 12.383671 |
Comparing the results from both tables, we see that they are exactly the same except for a very small rounding error difference in the gasPrice (GWEI) column (31.584090 v.s. 31.584089 value).
# Table obtained before using Etherscan
df_etherscan[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS]
| hash | date | from | to | value (ETH) | Txn Fee (ETH) | gasPrice (GWEI) | |
|---|---|---|---|---|---|---|---|
| 0 | 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e | 2022-10-26 20:44:23 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000380 | 17.364769 |
| 1 | 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 | 2022-10-26 17:17:35 | 0x1eaf03af983d1e20b1bab73964906c281e141d4f | Vitalik Buterin | 0.000000 | 0.000854 | 31.584089 |
| 2 | 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 | 2022-10-26 00:49:23 | Vitalik Buterin | 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 | 3.923686 | 0.000260 | 12.383671 |
del df_etherscan, bitquery_table_for_etherscan_comparison, raw_bitquery_df
Finally we are going to see how to use the Coingecko API but now in a quite different and perhaps easy way.
For some APIs there are open source providers that offer a client for specific programming languages that allow to retrieve data from the API in an specific language easier. In the case of Coingecko, there is an unofficial Python module developed by the community pycoingecko.
To retrieve information about coins in Coingecko, it is usually used a "coin id" that Coingecko has assigned to each coin or token.
Lets retrieve the list of all coins that are registered in the Coingecko API:
import pycoingecko
client = pycoingecko.CoinGeckoAPI()
ALL_COINS = client.get_coins_list(include_platform=False)
# include_platform = True gives the address of the coins in the available platforms
# Remember that Ethereum is not the only blockchain, there are many others such as Bitcoin, and some coins are not in Enthereum
# but they are in other blockchains.
logging.info(f"There are a total of {len(ALL_COINS)}coins (or tokens) listed in Coingecko.")
2022-11-02 13:34:58 [INFO] root - There are a total of 13255coins (or tokens) listed in Coingecko.
all_coins_df = pd.json_normalize(ALL_COINS)
all_coins_df.head(5)
| id | symbol | name | |
|---|---|---|---|
| 0 | 01coin | zoc | 01coin |
| 1 | 0-5x-long-algorand-token | algohalf | 0.5X Long Algorand |
| 2 | 0-5x-long-altcoin-index-token | althalf | 0.5X Long Altcoin Index |
| 3 | 0-5x-long-ascendex-token-token | asdhalf | 0.5X Long AscendEx |
| 4 | 0-5x-long-bitcoin-cash-token | bchhalf | 0.5X Long Bitcoin Cash |
We are going to obtain data for just some of the most popular cryptocurrencies: Ethereum, Bitcoin, Cardano, Solana, ROSE (Oasis Network) and the Shiba Inu token.
# Subset the coins we will retrieve data about
coins_to_study_df = all_coins_df.query('id in ["cardano", "ethereum", "oasis-network", "solana", "bitcoin", "shiba-inu"]')
coins_to_study_df
| id | symbol | name | |
|---|---|---|---|
| 1629 | bitcoin | btc | Bitcoin |
| 2256 | cardano | ada | Cardano |
| 4214 | ethereum | eth | Ethereum |
| 8393 | oasis-network | rose | Oasis Network |
| 10351 | shiba-inu | shib | Shiba Inu |
| 10676 | solana | sol | Solana |
coins_to_study_ids = coins_to_study_df.id.tolist()
coins_to_study_ids
['bitcoin', 'cardano', 'ethereum', 'oasis-network', 'shiba-inu', 'solana']
Now we are going to use a Python class that I have programmed to handle the retrieval of Historical coin related data from Coingecko, using the pycoingecko client, a coin_id, a range of dates [start_date, end_date] and a currency vs_currency to be used as unit of measurement.
VSCURR = typing.Literal["usd", "eur", "gbp", "jpy"] # dollar, euro, pound and japanese yen
class HistoricalCoinPriceRetriever():
coin_id: str
vs_currency: VSCURR
start_date: datetime
end_date: datetime
all_coins_list: typing.List[typing.Any] # client.get_coins_list(include_platform=True)
def __init__(
self,
coin_id: str,
start_date: datetime,
end_date: datetime,
all_coins_list: typing.List[typing.Any],
vs_currency: VSCURR = "usd",
):
super().__init__()
self.coin_id = coin_id
self.vs_currency = vs_currency
self.start_date = start_date
self.end_date = end_date
all_coins = all_coins_list
candidates = [v for v in all_coins if v["id"] == coin_id]
if len(candidates) == 0:
logging.error(f"Coin ID: {self.coin_id}")
logging.error(f"coins list len: {len(all_coins)}")
logging.error(
"Coin ID is not initialized well, please review the model card parameters !!!"
)
raise ValueError("Invalid Coin ID")
elif len(candidates) > 1:
logging.warning(
f"There are {len(candidates)} coins with same id {self.coin_id}: {candidates}"
)
if self.end_date.timestamp() < self.start_date.timestamp():
logging.error(f"end_date must be greater than start_date")
logging.error(
f"end_date: {datetime.datetime.fromtimestamp(self.end_date.timestamp,pytz.utc).isoformat()}"
)
logging.error(
f"start_date: {datetime.datetime.fromtimestamp(self.start_date.timestamp,pytz.utc).isoformat()}"
)
raise ValueError("end_date must be greater than start_date")
def transform(
self
) -> pd.DataFrame:
start_date = str(int(self.start_date.timestamp()))
end_date = str(int(self.end_date.timestamp()))
client = pycoingecko.CoinGeckoAPI()
results_data = client.get_coin_market_chart_range_by_id(
self.coin_id,
self.vs_currency,
start_date,
end_date, # prices, market_cap, total_volumes
)
prices_df = pd.DataFrame(
results_data["prices"], columns=[
"datetime", f"price_in_{self.vs_currency}"]
)
market_cap_df = pd.DataFrame(
results_data["market_caps"],
columns=["datetime1", f"market_cap_in_{self.vs_currency}"],
)
volumes_df = pd.DataFrame(
results_data["total_volumes"],
columns=["datetime2", f"24h_volume_in_{self.vs_currency}"],
)
stats_df = pd.concat([prices_df, market_cap_df, volumes_df], axis=1)
stats_df = stats_df[
[
"datetime",
f"price_in_{self.vs_currency}",
f"market_cap_in_{self.vs_currency}",
f"24h_volume_in_{self.vs_currency}",
]
]
stats_df["datetime"] = pd.to_datetime(stats_df["datetime"], unit="ms")
stats_df["coin_id"] = self.coin_id
return stats_df
For this tutorial, lets retrieve data from January 1st, 2021 to October 1st, 2022 and in USD:
# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2021-01-01"
END_DATE_STR = "2022-10-01"
VS_CURRENCY = "usd"
# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')
historical_coin_info_df = None
for i, coin_id in enumerate(coins_to_study_ids):
logging.info(f"Retrieving historical coin info for coin with id '{coin_id}'...")
retriever = HistoricalCoinPriceRetriever(coin_id, START_DATETIME, END_DATETIME, ALL_COINS, VS_CURRENCY)
if i == 0:
historical_coin_info_df = retriever.transform()
else:
historical_coin_info_df_i = retriever.transform()
historical_coin_info_df = pd.concat([historical_coin_info_df, historical_coin_info_df_i] , axis = 0)
2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'bitcoin'... 2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'cardano'... 2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'ethereum'... 2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'oasis-network'... 2022-11-02 13:35:00 [INFO] root - Retrieving historical coin info for coin with id 'shiba-inu'... 2022-11-02 13:35:00 [INFO] root - Retrieving historical coin info for coin with id 'solana'...
historical_coin_info_df.head()
| datetime | price_in_usd | market_cap_in_usd | 24h_volume_in_usd | coin_id | |
|---|---|---|---|---|---|
| 0 | 2021-01-01 | 29022.418395 | 5.394380e+11 | 4.350352e+10 | bitcoin |
| 1 | 2021-01-02 | 29352.126792 | 5.455933e+11 | 3.408972e+10 | bitcoin |
| 2 | 2021-01-03 | 32163.824935 | 5.978877e+11 | 5.727344e+10 | bitcoin |
| 3 | 2021-01-04 | 33008.226203 | 6.136169e+11 | 1.788941e+11 | bitcoin |
| 4 | 2021-01-05 | 31515.575967 | 5.857263e+11 | 7.465717e+10 | bitcoin |
In the following plot, you can see the coin prices in "VS_CURRENCY" units.
Since the values of different coins are at different scales, to see each of them individually, you can double click one of the coin_id values in the legend of the plot. Similarly, we show the plots for the coin market cap and the 24h volume.
fig = px.line(historical_coin_info_df, x="datetime", y=f"price_in_{VS_CURRENCY}", color = "coin_id",
title = f"Coin prices in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
)
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_prices_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))
fig = px.line(historical_coin_info_df, x="datetime", y=f"market_cap_in_{VS_CURRENCY}", color = "coin_id",
title = f"Market cap in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
)
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_marketcaps_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))
fig = px.line(historical_coin_info_df, x="datetime", y=f"24h_volume_in_{VS_CURRENCY}", color = "coin_id",
title = f"24h Volume in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
)
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_24hvolumes_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))
In this usecase example we retrieve DEX transactions data to see which are the most popular sold or bought tokens for a particular period of time. The code developed to solve this usecase is found in the src/usecases/DEX_trades module.
Background
USECASE_PATH = "src/usecases/DEX_trades"
from src.usecases.DEX_trades.dex_trades_retriever import DEXTradesRetriever
display(Markdown(open(os.path.join(USECASE_PATH, "README.md"), encoding='utf-8').read()))
DEXTradesRetriever class allows to retrieve DEX (swap) transactions (from the ethereum network) data for a given list of protocols or for all available protocols, and between a range of dates. Additionally, an extra output DataFrame with the count of dex trades per protocol is returned in the ProtocolDexTrCounts output DataFrame.
At least, the following fields are obtained in the DEX_Trades output DataFrame:
transaction_from_address: usually the wallet address from which the swap transaction has been performed.datetime: date of the transaction (datetime format).protocol: transaction swap protocol (ex: Uniswap v3)total_trade_amount_usd: amount of USD that quantifies the transaction.sold_token_symbol: symbol of the token that is sold.sold_tokens: number of sold tokens (ex: 20 SOLANA)bought_tokens: number of bought tokens (ex: 1 WETH).bought_token_symbol: symbol of the token that is bought.transaction_gas_value: amount of gas (in ETH) wasted in the transaction (transaction fee).bought_token_address: contract address of the bought token.sold_token_address: contract address of the sold token.transaction_hash: hash of the transaction.For the ProtocolDexTrCounts output DataFrame, the columns are just two:
protocol: protocol name.count: number of dex transactions, in the specified range of dates, that have been done via the given protocol.Note: whether or not wallet_addresses parameter is specified, the ProtocolDexTrCounts dataframe will contain the counts over all DEX transactions, not only transactions for the specified address, for the specified date range.
Tip: To convert the column transaction_gas_value from ETH to USD in order to know the value in USD of the gas at the time of the transactions, it is necessary to know the price of ETH in USD at the time of the transaction.
Warning: When the protocols parameter is not specified (i.e., left empty), all available protocols at the time of running the transform method of the retriever will be considered. For this case, keep in mind that the running time can be huge, although specifying just a cuple of days via the start_date and end_date parameters.
# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2022-10-31"
END_DATE_STR = "2022-11-01"
# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')
trades_retriever = DEXTradesRetriever(api_key=BITQUERY_API_KEY, start_date=START_DATETIME, end_date=END_DATETIME)
usecase_outputs_path = os.path.join(OUTPUTS_PATH, "DEX_trades")
dex_trades_file = f"dex_trades_{START_DATE_STR}_{END_DATE_STR}.csv"
protocols_file = f"protocols_count_{START_DATE_STR}_{END_DATE_STR}.csv"
dex_trades_path = f"{usecase_outputs_path}/{dex_trades_file}"
protocols_path = f"{usecase_outputs_path}/{protocols_file}"
if os.path.exists(dex_trades_path) and os.path.exists(protocols_path):
logging.warn(f"Existing data for the time range [{START_DATETIME},{END_DATETIME}]. Reading DEX trades and protocols dataframes from file.")
protocols_df = pd.read_csv(protocols_path)
trades_df = pd.read_csv(dex_trades_path)
else:
halloween_trades_dict = trades_retriever.transform() # ~45 min running time for 5 days. ~ 15min for 2 days
protocols_df = halloween_trades_dict["ProtocolDexTrCounts"]
trades_df = halloween_trades_dict["DEX_Trades"]
protocols_df.to_csv(protocols_path, index = False)
trades_df.to_csv(dex_trades_path, index = False)
2022-11-02 13:35:47 [WARNING] root - Existing data for the time range [2022-10-31 00:00:00,2022-11-01 00:00:00]. Reading DEX trades and protocols dataframes from file.
C:\Users\daniel.salgado\AppData\Local\Temp\ipykernel_22932\836146019.py:2: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
trades_df.shape
(332261, 12)
protocos_df = protocols_df.sort_values(by = "count", ascending = False)
# Top 10 protocols
protocos_df.head(10)
| protocol | count | |
|---|---|---|
| 30 | Uniswap v2 | 256214 |
| 31 | Uniswap v3 | 88975 |
| 4 | Balancer v2 | 4096 |
| 33 | Zerox Exchange v2 | 1453 |
| 35 | Zerox Exchange v4 | 1387 |
| 3 | Balancer Pool Token | 986 |
| 6 | Bancor Network v2 | 858 |
| 8 | Curve | 359 |
| 12 | Dodo | 299 |
| 29 | Uniswap | 197 |
sns.barplot(data=protocos_df.head(10), x = "count", y="protocol");
sold_tokens_counts = trades_df["sold_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
sold_tokens_counts.head(15)
| token_symbol | counts | |
|---|---|---|
| 0 | WETH | 125815 |
| 1 | USDC | 28865 |
| 2 | USDT | 12461 |
| 3 | DAI | 3911 |
| 4 | Vine | 3868 |
| 5 | SHIB | 2433 |
| 6 | WBTC | 2419 |
| 7 | LMEOW | 2296 |
| 8 | USHI | 2104 |
| 9 | BUSHI | 1967 |
| 10 | Error in symbol | 1913 |
| 11 | TITTER | 1810 |
| 12 | TAMA | 1702 |
| 13 | HEX | 1553 |
| 14 | $TRUMP | 1540 |
sns.barplot(data=sold_tokens_counts.head(15), x = "counts", y="token_symbol");
bought_tokens_counts = trades_df["bought_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
bought_tokens_counts.head(15)
| token_symbol | counts | |
|---|---|---|
| 0 | WETH | 163475 |
| 1 | USDC | 28853 |
| 2 | USDT | 13159 |
| 3 | DAI | 4178 |
| 4 | Vine | 2619 |
| 5 | WBTC | 2486 |
| 6 | SHIB | 1969 |
| 7 | LMEOW | 1494 |
| 8 | HEX | 1382 |
| 9 | $TRUMP | 1325 |
| 10 | Error in symbol | 1307 |
| 11 | DC | 1200 |
| 12 | TITTER | 1175 |
| 13 | TAMA | 1161 |
| 14 | XEN | 978 |
sns.barplot(data=bought_tokens_counts.head(15), x = "counts", y="token_symbol");
VITALIK_BUTERIN_ADDRESS = "0xab5801a7d398351b8be11c439e05c5b3259aec9b"
# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2022-01-01"
END_DATE_STR = "2022-12-31"
# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')
WALLET_ADDRESSES = [VITALIK_BUTERIN_ADDRESS]
vb_trades_retriever = DEXTradesRetriever(api_key=BITQUERY_API_KEY, start_date=START_DATETIME, end_date=END_DATETIME,
wallet_addresses=WALLET_ADDRESSES)
vb_df_dict = vb_trades_retriever.transform()
2022-11-02 13:35:50 [INFO] root - Retrieving number of DEX transactions per protocol data... 2022-11-02 13:35:51 [INFO] root - Retrieving DEX transactions all available protocols, from 2022-01-01 to 2022-12-31... 2022-11-02 13:35:51 [WARNING] root - - Waiting for 5.5 seconds until next request to the Bitquery Api due to free plan limits. 2022-11-02 13:35:58 [WARNING] root - - Waiting for 4.6 seconds until next request to the Bitquery Api due to free plan limits. 2022-11-02 13:36:05 [INFO] root - Finished in 2 iterations.
vb_df_dict.keys()
dict_keys(['DEX_Trades', 'ProtocolDexTrCounts'])
vb_dex_trades = vb_df_dict["DEX_Trades"]
vb_dex_trades_by_protocol = vb_df_dict["ProtocolDexTrCounts"]
vb_dex_trades.head()
| transaction_from_address | datetime | protocol | total_trade_amount_usd | sold_token_symbol | sold_tokens | bought_tokens | bought_token_symbol | transaction_gas_value | bought_token_address | sold_token_address | transaction_hash | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0x24f8e4f5188f63a882e453d10636a0112fd6c7e1 | 2022-01-04 14:09:18 | Uniswap v2 | 76962.197266 | FOMO | 1.482848e+12 | 20.000 | WETH | 0.012183 | 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 | 0xb68f32410a7dd4cf7a1ae18c6b6ddefa2eed80b3 | 0xdea19cb511080b06ec705105abb168609066d2a29a2a8e5082a6d4d6db758ea8 |
| 1 | 0xf5c0f8388d32ee34deb0e34fa9eff279f54448c4 | 2022-01-06 21:20:25 | Uniswap v2 | 13721.250977 | SHIBA DAO | 6.922182e+10 | 4.000 | WETH | 0.015196 | 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 | 0xf1656ae9e8227da5ebb93406e2edd74d9820d0ad | 0xe4a50708b27563b031d2f16b2535c6f08e22bb9a7cbec171d2b90a4e5599a9fa |
| 2 | 0x6081bb069380cd45753b21857b259a0be76f23e7 | 2022-01-08 06:26:52 | Uniswap v2 | 4826.196533 | RON | 5.095918e+09 | 1.500 | WETH | 0.009534 | 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 | 0xebfc9d08f47b6b4411eaf5b97f1fd7d9b6d2929c | 0xe3e93539f72e1e6695b6fc39b4bdd99e587cc768b24bfd3464ba7e4023105052 |
| 3 | 0x7699833498006a7a8f42835a842e124c8f9a3ae3 | 2022-01-10 05:05:07 | Uniswap v2 | 5949.725798 | WEB3.0 | 1.253301e+10 | 1.873 | WETH | 0.011379 | 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 | 0xd1bfb50bcb96635cf13fe50ba507ebbf37b09234 | 0xbc74997dccb810a6819fa9fe3cef44b8f1b73b96cad0d3a5903f7ffe54589d4f |
| 4 | 0x012c192653fefdf2b4626011ae8cecab40c2ed86 | 2022-01-12 11:59:53 | Uniswap v2 | 27681.426636 | LOVE | 6.738590e+10 | 8.500 | WETH | 0.014988 | 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 | 0x2efc83c953f7a6a059744270c358edd6162f89d7 | 0x4ca1f1184e5150992f463107378c510f22245921ecd76d60d12f56be9a6e4689 |
sold_tokens_counts = vb_dex_trades["sold_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
sold_tokens_counts.head(10)
| token_symbol | counts | |
|---|---|---|
| 0 | SHIB | 15 |
| 1 | GAL | 2 |
| 2 | USDC | 2 |
| 3 | Ape DAO | 2 |
| 4 | UkraineDAO | 2 |
| 5 | $NOTHING | 1 |
| 6 | GOAL | 1 |
| 7 | ROPs | 1 |
| 8 | VYPER | 1 |
| 9 | DETH | 1 |
sns.barplot(data=sold_tokens_counts.query('counts > 1'), x = "counts", y="token_symbol");
bought_tokens_counts = vb_dex_trades["bought_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
bought_tokens_counts.head(10)
| token_symbol | counts | |
|---|---|---|
| 0 | WETH | 61 |
sns.barplot(data=bought_tokens_counts.query('counts > 1'), x = "counts", y="token_symbol");
The goal of this exercise is to obtain, using the Bitquery API, all Normal transactions where given address is participating (either as 'To' or 'From' address) and for anarbitrary time range [start_date, end_date].
In section 3.1.2 we did it using the start block and end block numbers, which gave the particular case of transactions for the date 2022-10-26. The goal now is to use dates instead of block numbers, and generalize to a range of dates.
You should write a function similar to the get_transactions_query function, now using two new parameters start_date and end_date, not block numbers. Tip: Use the Bitquery IDE to build the query and look for some "date" filter field.
Retrieve all transactions from January 1st 2022 to October 1st 2022 for the Vitalik Buterin address (0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B). Create the following plots:
Extra: Try the same exercise but using datetime ranges (i.e., include Hour,Minut, Second precision) instead of just a raw date (i.e., year-month-date).
# Code your function below
# Check your function for transactions where the 'To' address is the input address
# <Replace this with your code>
# Check your function for transactions where the 'From' address is the input address
# <Replace this with your code>
# Retrieve transactions and store as pandas dataframe
# Preprocess and prepare the dataframes for visualizations
# Count plot
# Line plot
The goal of this exercise is to reproduce Section 3.1.3 where we used the pycoingecko Python client as a handler of the Coingecko API to retrieve data, but now do the same from scratch, using one or more endpoints from the Coingecko API.
Explore the endpoints of the Coingecko API Coingecko API and see which of them (It may be just one endpoint) allows you to retrieve the necessary data to reproduce Section 3.1.3.
Retrieve data and build the corresponding dataframe.
Make the historical coin price, market cap and 24h volum visualizations in DOLLAR and EURO.
Tip: You may need to use a similar function to make requests to the Coingecko API as we did when using Etherscan API in section 3.1.1 but now without needing an API key.
Extra: You may explore more historical coin related data features and make additional plots for them.
# Create cells and add code as needed
The goal of this usecase exercise is to obtain statistics of which Tokens/Coins have been traded more frequently (sold and/or bought) in the last week (or an arbitrary week starting from a monday to a sunday) and in DEX (Decentralized Exchanges).
This would provide insights of which tokens/coins are currently being bought or sold by the community. Trades consist on interchanging one cryptocurrency by another. For example, I could trade (sell) 1 ETH for an equivalent quantity of "BTC" (buy), where the quantity bought would be calculated using the current price of BTC and accounting also for the Fees payed to do the transactions involved in that trade.
Exercise: Use the Bitquery API to:
Tips:
# Create cells and add code as needed
The goal of this usecase exercise is to totally or partially reproduce the dashboard shown in this Dune-app Dashboard to shows an overview of the 'Biggest Moves' in the Ethereum ecosystem in a particular timeframe (default 1 day, top 10)
Exercise: Investagate how to obtain at least one of the tables listed above (for example, Largest ETH amounts moved or Largest DEX Trades) using some public API (Bitquery, Etherscan,...).
# Create cells and add code as needed
For now, no solutions will be provided